Many network systems are composed of interdependent but distinct types of interactions, which
cannot be fully understood in isolation. These different types of interactions are often represented
as layers, attributes on the edges or as a time-dependence of the network structure. Although they
are crucial for a more comprehensive scientific understanding, these representations offer
substantial challenges. Namely, it is an open problem how to precisely characterize the large or
mesoscale structure of network systems in relation to these additional aspects. Furthermore, the
direct incorporation of these features invariably increases the effective dimension of the network
description, and hence aggravates the problem of overfitting, i.e. the use of overly-complex
characterizations that mistake purely random fluctuations for actual structure. In this work, we
propose a robust and principled method to tackle these problems, by constructing generative models
of modular network structure, incorporating layered, attributed and time-varying properties, as
well as a nonparametric Bayesian methodology to infer the parameters from data and select the most
appropriate model according to statistical evidence. We show that the method is capable of
revealing hidden structure in layered, edge-valued and time-varying networks, and that the most
appropriate level of granularity with respect to the additional dimensions can be reliably
identified. We illustrate our approach on a variety of empirical systems, including a social
network of physicians, the voting correlations of deputies in the Brazilian national congress, the
global airport network, and a proximity network of high-school students.


I. INTRODUCTION 

The network abstraction has been successfully used as a powerful framework behind the modeling of
a great variety of biological, technological and social systems [1]. Traditionally, most network
models proposed in these contexts consist of a set of elements possessing a single type of pairwise
interaction (e.g. epidemic contact, transport route, metabolic reaction, etc.). More recently, it
has become increasingly clear that single types of interaction do not occur in isolation, and that
a complete system encompasses several layers of interactions [2–4], which very often change in
time [5]. Many examples have shown that the interplay between different types of interactions can
dramatically change the outcome of paradigmatic processes such as percolation [6], epidemic
spreading [7–9], diffusion [10, 11], opinion formation [?], evolutionary games [?], and
synchronization [?], among others. The realization that different types of interaction need to be
incorporated into network models also changes the way data need to be analyzed. In particular, the
large or mesoscale structure of network systems may be intertwined with the layered or temporal
structure, in such a way that it cannot be seen if this information is omitted. The conventional
approach of representing mesoscale structures is to separate the nodes into groups (or modules,
“communities”) that have a similar role in the network topology [?]. Some methods have been
proposed to identify such groups in both layered [?–22] and time-varying [23–28] networks. However,
these methods do not address two very central questions: 1. Is the layered or temporal structure
indeed important for the description of the network? And if so, to what degree of granularity?
2. How does one distinguish between multiple descriptions of the same network, and in particular
separate actual structure from stochastic fluctuations? In this work we tackle both these questions
by formulating generative models of layered networks, obtained by generalizing several variants of
the stochastic block model [29–32], incorporating features such as hierarchical structure [33, 34],
overlapping groups [35–37] and degree correction [38], in addition to different types of layered
structure. We show how the uncritical incorporation of many layers that happen to be uncorrelated
with the mesoscale structure can in fact hinder the detection task, and obscure structure that
would be visible by ignoring the layer division in the usual fashion. Since most methods proposed
so far take any available layer information for granted, and attempt to model it in absolute
detail, this issue represents a severe limitation of these methods in capturing the structure of
layered networks in a reliable manner. We show how this problem can be solved by performing model
selection under a general nonparametric Bayesian framework, which can also be used to select
between different model flavors (e.g. with overlapping groups or degree correction). We demonstrate
that the proposed methodology can also be used to infer mesoscale structure in networks with
real-valued correlates on the edges (such as weights, distances, etc.), while reliably
distinguishing structure from noise, as well as change-points in time-varying networks [39].

This work extends recent developments on layered [40–45], edge-valued [46–49] and temporal [50–54]
generative processes, not only by incorporating many important topological patterns simultaneously
(i.e. hierarchical structure, degree correction and overlapping groups), but also by tying all
these types of model into a nonparametric Bayesian framework that permits model selection and
avoids overfitting. The framework presented allows one to select not only among all different
model classes, but also their appropriate order, i.e. the number of groups, layer bins and
hierarchical structure. This is done in a principled fashion, based on statistical evidence and
the principle of parsimony, and without the specification of ad hoc parameters. Furthermore, since
it is based on the computation of posterior probabilities, it can be extended to other
probabilistic models.

This paper is organized as follows. In Sec. II we formulate generative models for layered
structure, including a very diverse set of possible topological patterns, and in Sec. III we
describe a Bayesian model selection procedure to choose between them based on statistical
evidence. In Sec. IV we tackle the problem of deciding whether or not the layered structure is
informative of the network structure. In Sec. V we show how the layered models can be adapted to
networks with real-valued edge covariates, and in Sec. VI to networks that change in time, for
which the division into layers corresponds to a detection of change-points. We conclude in
Sec. VII.


II. GENERATIVE MODELS OF LAYERED 
NETWORKS 

We consider graphs that have a layered structure [?], so that the adjacency matrix in layer
$l \in [1, C]$ can be written as $A^l_{ij}$ (with values in $\{0, 1\}$ for a simple graph, or in
$\mathbb{N}$ for a multigraph), corresponding to the presence of an edge between vertices $i$ and
$j$ in layer $l$. We will consider both directed and undirected graphs (i.e. $A^l_{ij}$ being
asymmetric and symmetric, respectively), although we will focus on the undirected case in most of
the derivations, since the directed cases are mostly straightforward modifications (which are
summarized in the Appendix). Here we assume that the vertices are globally indexed, and in
principle can receive edges in all layers. The collapsed graph corresponds to the merging of all
edges in a single layer, with a resulting adjacency matrix $A_{ij} = \sum_l A^l_{ij}$. In the
following, we will denote a specific layered graph as $\{G_l\}$ (with $G_l = \{A^l_{ij}\}$ being an
individual layer), and its corresponding collapsed graph as $G_c = \{A_{ij}\}$.
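
As a concrete illustration of this notation, the following minimal Python sketch builds the
collapsed adjacency matrix from a list of per-layer adjacency matrices. The function name
`collapse_layers`, the dense list-of-lists representation and the example data are illustrative
assumptions, not part of the original formulation.

```python
def collapse_layers(layers):
    """Return A_ij = sum_l A^l_ij for a list of C square adjacency matrices."""
    n = len(layers[0])
    return [[sum(A[i][j] for A in layers) for j in range(n)] for i in range(n)]


# Two undirected layers on three globally indexed vertices (hypothetical data).
layer_1 = [[0, 1, 0],
           [1, 0, 0],
           [0, 0, 0]]
layer_2 = [[0, 1, 1],
           [1, 0, 0],
           [1, 0, 0]]

collapsed = collapse_layers([layer_1, layer_2])
```

In this example vertices 0 and 1 are connected in both layers, so the collapsed graph is a
multigraph even though each individual layer is a simple graph.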

In this work we will consider two alternative ways of generating a given layered graph $\{G_l\}$
(see Fig. 1). The first approach interprets the layers as edge covariates [?]: First the collapsed
graph $G_c$ is generated, and then the layer membership of each edge is a random variable sampled
from a distribution conditioned on the adjacent vertices. In the second approach, the graphs $G_l$
at each layer $l$ are generated independently from each other. (Henceforth we refer to these
alternatives simply as “edge covariates” and “independent layers”, respectively.) These different
generative processes do not exhaust the realm of possible multilayer models. Instead, the
objective here is to consider the most basic possibilities that allow us to incorporate different
types of properties into the generated networks, and enable the formulation of a nonparametric
model selection framework to decide if either one is more appropriate than the other depending on
the statistical evidence available in the data, as discussed in detail below.

Figure 1. (Color online) Two processes capable of generating layered networks. Left (edge
covariates): The collapsed graph is generated first, and conditioned on it, the edges are
distributed among the layers. Right (independent layers): The layers are formed independently from
each other.

In the following we define two versions of the stochastic block model (SBM) family, corresponding
to the alternatives outlined above.


A. SBM with edge covariates 

We generate first a collapsed graph from the traditional SBM ensemble, where $N$ nodes are divided
into $B$ groups, via the membership vector $\{b_i\} \in [1, B]^N$, and the number of edges randomly
placed between groups $r$ and $s$ is given by the edge counts $e_{rs}$ (or twice the number if
$r = s$, for convenience of notation). After the graph is generated, for each set of edges incident
on groups $r$ and $s$, we distribute the layer memberships randomly, conditioned only on the total
number of edges of each type $l$ between the two groups, $\{e^l_{rs}\}$. Any particular
distribution of covariates among edges incident on groups $r$ and $s$ is generated with the same
probability, which in the case of simple undirected graphs is given by

$$\frac{\prod_l m^l_{rs}!}{m_{rs}!}, \tag{1}$$

where $m_{rs} = \sum_l m^l_{rs} = (1 - \delta_{rs}/2)\, e_{rs}$. For the multigraph case, see the
Appendix. If we use the shorthand $\{\theta\} = \{\{e^l_{rs}\}, \{b_i\}\}$ for the model
parameters, the total likelihood of observing the layered graph is

$$P(\{G_l\}|\{\theta\}) = P(G_c|\{\theta\}) \prod_{r \le s} \frac{\prod_l m^l_{rs}!}{m_{rs}!}, \tag{2}$$

where $P(G_c|\{\theta\}) = e^{-\mathcal{S}_t}$ is the likelihood of the collapsed stochastic block
model, and $\mathcal{S}_t$ is the microcanonical entropy [?]. For instance, for simple undirected
graphs that are sparse (i.e. with $e_{rs} \ll n_r n_s$), we have [55]

$$\mathcal{S}_t \approx E - \frac{1}{2} \sum_{rs} e_{rs} \ln \frac{e_{rs}}{n_r n_s}, \tag{3}$$

with $E = \sum_{rs} e_{rs}/2$ being the total number of edges and $n_r$ the number of nodes in
group $r$.
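
The likelihood of the edge covariates model can be evaluated numerically along the following
lines. This is a minimal sketch, not the reference implementation: it assumes the sparse-limit
entropy $\mathcal{S}_t \approx E - \frac{1}{2}\sum_{rs} e_{rs} \ln(e_{rs}/n_r n_s)$ for the
collapsed SBM and the factor $\prod_{r \le s} \prod_l m^l_{rs}! / m_{rs}!$ for the layer
assignment. The function names and the nested-list input format are hypothetical.

```python
import math


def sparse_sbm_entropy(e, n):
    """Sparse-limit entropy S_t ~ E - (1/2) sum_rs e_rs ln(e_rs / (n_r n_s)).

    `e` is a symmetric B x B matrix whose diagonal holds twice the number of
    edges inside each group; `n` lists the group sizes.
    """
    B = len(n)
    S = sum(map(sum, e)) / 2  # total number of edges E
    for r in range(B):
        for s in range(B):
            if e[r][s] > 0:
                S -= 0.5 * e[r][s] * math.log(e[r][s] / (n[r] * n[s]))
    return S


def covariate_log_term(e_layers):
    """ln of prod_{r<=s} (prod_l m^l_rs!) / m_rs!, with m^l_rr = e^l_rr / 2."""
    B = len(e_layers[0])
    total = 0.0
    for r in range(B):
        for s in range(r, B):
            halve = 2 if r == s else 1
            m = [e[r][s] // halve for e in e_layers]
            total += sum(math.lgamma(x + 1) for x in m) - math.lgamma(sum(m) + 1)
    return total


def edge_covariates_nll(e_layers, n):
    """-ln P({G_l}|{theta}): collapsed entropy minus the log of the layer term."""
    B = len(n)
    collapsed = [[sum(e[r][s] for e in e_layers) for s in range(B)] for r in range(B)]
    return sparse_sbm_entropy(collapsed, n) - covariate_log_term(e_layers)
```

Since the layer term is a probability, its logarithm is never positive, so distributing the edges
among layers can only increase the negative log-likelihood relative to the collapsed graph.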





Here we are free to replace the traditional SBM by any other flavor, which amounts simply to a
different likelihood in the first term of Eq. (2). The traditional SBM considered above imposes
that all nodes belonging to the same group will receive the same number of edges on average, with
little variation. An important alternative to this is the degree-corrected stochastic block model
(DCSBM) [38], which includes as additional model parameters the degree sequence of the network,
$\{k_i\}$. As argued in Ref. [38], and supported by an empirical model selection analysis in
Ref. [?], this version is often a better model for many (collapsed) networks that feature
significant degree variability. However, in this version with edge covariates, only the degrees of
the collapsed graph are constrained, and thus the edges incident on a specific node will be
distributed randomly among the layers independently of its degree. Hence, for networks generated
in this manner, nodes with a large collapsed degree will also tend to possess uniformly larger
degrees in all layers, when compared to other nodes of the same group with a lower collapsed
degree. In other words, this model does not allow for degree variability across layers.

The complete likelihood of this model can be obtained in an entirely analogous fashion, simply by
augmenting the parameter set in Eq. (2) to include the collapsed degree sequence, i.e.
$\{\theta\} = \{\{e^l_{rs}\}, \{b_i\}, \{k_i\}\}$, and using the likelihood of the degree-corrected
model [?].

Other useful variations are SBMs with mixed memberships (e.g. [?]), in which nodes are allowed to
belong to more than one group. Here we use the formulation of Ref. [?], where we need to replace
the node partitions above by overlapping partitions, $\{\vec{b}_i\}$, where $\vec{b}_i$ determines
the mixture of node $i$, with $b^r_i \in \{0, 1\}$ specifying whether node $i$ belongs to group
$r$, so that $\{\theta\} = \{\{e^l_{rs}\}, \{\vec{b}_i\}\}$. Likewise, for the degree-corrected
version, we need to specify the (collapsed) labeled degree sequence $\{\vec{k}_i\}$, where $k^r_i$
is the degree of node $i$ of type $r$, leading to
$\{\theta\} = \{\{e^l_{rs}\}, \{\vec{b}_i\}, \{\vec{k}_i\}\}$. In both cases we simply replace the
likelihood in Eq. (2) by the ones described in Ref. [?].


B. SBM with independent layers 

Alternatively, we may generate each layer as an independent SBM, constrained only by the fact that
the group memberships of the nodes are the same across all layers (although this can be relaxed in
the overlapping version, as discussed below). Furthermore, we allow nodes to belong only to a
subset of the layers, by including an $N \times C$ layer membership matrix $\{z_{il}\}$, where each
binary entry $z_{il} \in \{0, 1\}$ determines whether node $i$ belongs to layer $l$. If a node does
not belong to a given layer, it is forbidden to receive edges of that type.

Using the shorthand $\{\{\theta\}_l\} = \{\{e^l_{rs}\}\}$ and $\{\phi\} = \{b_i\}$, the likelihood
of the resulting layered block model is simply

$$P(\{G_l\}|\{\{\theta\}_l\}, \{\phi\}, \{z_{il}\}) = \prod_l P(G_l|\{\theta\}_l, \{\phi\}), \tag{4}$$

with $P(G_l|\{\theta\}_l, \{\phi\})$ being the likelihood of the traditional stochastic block model
as before, where $G_l$ is the subgraph containing only the edges of layer $l$ and the nodes
specified by $\{z_{il}\}$.
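
The role of the layer membership matrix can be made concrete with a short sketch. It assumes that
each layer contributes a sparse-limit SBM entropy
$\mathcal{S}_t \approx E - \frac{1}{2}\sum_{rs} e_{rs} \ln(e_{rs}/n_r n_s)$ evaluated with the
group sizes restricted to the nodes present in that layer, so that the negative log-likelihood of
the independent layers model is a sum over layers. The function names and input format are
hypothetical.

```python
import math


def group_sizes_in_layer(b, z, layer, B):
    """Count the nodes of each group that belong to a given layer."""
    sizes = [0] * B
    for i, r in enumerate(b):
        if z[i][layer]:
            sizes[r] += 1
    return sizes


def sparse_sbm_entropy(e, n):
    """S_t ~ E - (1/2) sum_rs e_rs ln(e_rs / (n_r n_s)), with e_rr counted twice."""
    B = len(n)
    S = sum(map(sum, e)) / 2
    for r in range(B):
        for s in range(B):
            if e[r][s] > 0:
                S -= 0.5 * e[r][s] * math.log(e[r][s] / (n[r] * n[s]))
    return S


def independent_layers_nll(e_layers, b, z, B):
    """-ln P({G_l}) as a sum of per-layer entropies (product of likelihoods)."""
    return sum(
        sparse_sbm_entropy(e, group_sizes_in_layer(b, z, l, B))
        for l, e in enumerate(e_layers)
    )


# Four nodes in two groups; node 1 is absent from the second layer.
b = [0, 0, 1, 1]
z = [[1, 1], [1, 0], [1, 1], [1, 1]]
e_layers = [[[2, 1], [1, 2]], [[0, 1], [1, 0]]]
nll = independent_layers_nll(e_layers, b, z, B=2)
```

The tiny example is far from the sparse regime and serves only to show the bookkeeping: removing
a node from a layer shrinks the corresponding $n_r$ and hence the number of graphs that layer can
generate.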

As with the edge covariates model, here we are also free to replace the traditional SBM by any
other flavor, which amounts simply to different likelihoods in the product of Eq. (4). However,
unlike the SBM with edge covariates, if we wish to include degree correction, we need to specify
the layer-specific degree sequence $\{k^l_i\}$, where $k^l_i = \sum_j A^l_{ij}$ is the degree of
node $i$ in layer $l$, so that $\{\{\theta\}_l\} = \{\{e^l_{rs}\}, \{k^l_i\}\}$. Therefore, unlike
the previous case, this model allows for degree variability across different layers, i.e. a node
with a large degree in one layer may possess a very low degree in another. Note that given the
layer-specific degree sequence, we do not need to distinguish between nodes that belong or not to
a layer, since a node with a layer-specific degree equal to zero will inherently not receive any
edge in that layer. Therefore the parameters $\{k^l_i\}$ replace the parameters $\{z_{il}\}$, which
are removed from Eq. (4) in this case.

We again may wish to use mixed-membership models in each layer, by using overlapping partitions as
parameters, i.e. $\{\phi\} = \{\vec{b}_i\}$. For the degree-corrected version, we need to specify
the labeled degree sequence at each layer, $\{\vec{k}_i\}_l$, where $k^r_{il}$ is the degree of
node $i$ of type $r$ in layer $l$, i.e. $\{\{\theta\}_l\} = \{\{e^l_{rs}\}, \{\vec{k}_i\}_l\}$. We
may view the labeled degree sequence inside each layer as a weighted membership to each group.
Since these “weights” may change across the layers (even becoming zero), this corresponds to a
generalization that allows the memberships to change arbitrarily between the layers (despite the
fact that the overall, unweighted group mixtures $\{\vec{b}_i\}$ are constant across the layers).
This is a particularly useful property for temporal networks, since it allows group membership to
change in time, as discussed in more detail in Sec. VI.

C. Equivalence between models

The “independent layers” and “edge covariates” models are equivalent in some situations, and
different in others. In particular, in the non-degree-corrected case described above, if all nodes
belong to all layers, both models generate the same networks asymptotically with the same
probability. This can be seen by employing Stirling's approximation $\ln m! \approx m \ln m - m$ in
Eq. (2), which makes it identical to Eq. (4). Hence, as long as the edge counts in each layer are
sufficiently large, these models are fully equivalent. However, if nodes belong only to a specific
subset of the layers, these models are not equivalent. In this case, only the model with
independent layers will take the heterogeneous layer memberships into account, and hence it should
be preferred. Since we assume that the layer memberships are known a priori, there is no reason to
employ the “edge covariates” non-degree-corrected model, since the “independent layers” model will
always provide an equal or better description asymptotically.¹
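
The asymptotic equivalence can be checked numerically. The sketch below compares the negative
log-likelihoods of the two non-degree-corrected models on the same edge counts, assuming the
sparse-limit entropy $\mathcal{S}_t \approx E - \frac{1}{2}\sum_{rs} e_{rs} \ln(e_{rs}/n_r n_s)$
for each SBM, the factor $\prod_{r \le s}\prod_l m^l_{rs}!/m_{rs}!$ for the edge covariates, and
all nodes belonging to all layers. The two-group, two-layer edge counts and the helper names are
hypothetical; both the edge counts and the group sizes are multiplied by `scale`, so that the
graph stays sparse as it grows.

```python
import math


def sparse_sbm_entropy(e, n):
    """S_t ~ E - (1/2) sum_rs e_rs ln(e_rs / (n_r n_s)), with e_rr counted twice."""
    B = len(n)
    S = sum(map(sum, e)) / 2
    for r in range(B):
        for s in range(B):
            if e[r][s] > 0:
                S -= 0.5 * e[r][s] * math.log(e[r][s] / (n[r] * n[s]))
    return S


def relative_gap(scale):
    """Relative difference between the two models' negative log-likelihoods."""
    n = [100 * scale, 100 * scale]
    layers = [[[4 * scale, 1 * scale], [1 * scale, 2 * scale]],
              [[2 * scale, 3 * scale], [3 * scale, 0]]]
    # Independent layers: product of per-layer likelihoods.
    independent = sum(sparse_sbm_entropy(e, n) for e in layers)
    # Edge covariates: collapsed likelihood times the exact factorial term.
    collapsed = [[layers[0][r][s] + layers[1][r][s] for s in range(2)] for r in range(2)]
    log_term = 0.0
    for r in range(2):
        for s in range(r, 2):
            halve = 2 if r == s else 1
            m = [e[r][s] // halve for e in layers]
            log_term += sum(math.lgamma(x + 1) for x in m) - math.lgamma(sum(m) + 1)
    covariates = sparse_sbm_entropy(collapsed, n) - log_term
    return abs(independent - covariates) / independent


gaps = {scale: relative_gap(scale) for scale in (1, 100, 10**4)}
```

The residual difference comes from the subleading terms neglected by Stirling's approximation,
which grow only logarithmically with the edge counts, so the relative gap shrinks as the layers
become denser in edges.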

The situation is different for the degree-corrected models. Strictly, the two model versions are
not equivalent, since the layered version allows for degree variability across layers, whereas the
covariate version does not. Hence, there are networks generated by the layered model that cannot
be generated (or only with a vanishing probability) by the edge covariates model. The opposite,
however, is not true: A layered network generated by the covariate version can always be sampled
with the independent layers version given an appropriate parameter choice.

Since the SBM with independent layers always encapsulates the edge covariates version, one might
be tempted to prefer it systematically. However, one needs to realize that the layered version
requires more parameters than the covariates version, either via the layer membership matrix
$\{z_{il}\}$ or the layer-specific degree sequence $\{k^l_i\}$. Similar comparisons can be made
between specific flavors of both models (e.g. with overlapping groups or degree correction).
Because of the increased number of degrees of freedom in the model specification, we risk
overfitting the data by always choosing the most constrained model. We discuss exactly how this
choice between models should be made in the next section.


III. SELECTING THE MOST APPROPRIATE 
MODEL 

The proper way to select between alternatives is to perform model selection based on statistical
significance, and opt for the more complicated model only if there is sufficient evidence
available in the data to compensate for the larger number of parameters. Formulated in a Bayesian
setting, as proposed in Ref. [?], this selection procedure amounts to finding the model that
maximizes the posterior likelihood

$$P(\{\theta\}|\{G_l\}) = \frac{P(\{G_l\}|\{\theta\})\, P(\{\theta\})}{P(\{G_l\})}, \tag{5}$$

where $\{\theta\}$ is a shorthand for the entire set of model parameters (e.g. for the
non-degree-corrected SBM with edge covariates we have $\{\theta\} = \{\{b_i\}, \{e^l_{rs}\}\}$),
$P(\{\theta\})$ is the prior probability on the parameters, and $P(\{G_l\})$ is a normalization
constant. Since in our context we are dealing with discrete parameters, we can write
$P(\{\theta\}) = e^{-\mathcal{L}(\{\theta\})}$, where $\mathcal{L}(\{\theta\})$ is the
microcanonical entropy of the parameter ensemble. Therefore, we have that
$-\ln P(\{\theta\}|\{G_l\}) = \Sigma + \ln P(\{G_l\})$, with
$\Sigma = \mathcal{S}(\{G_l\}) + \mathcal{L}(\{\theta\})$ being the description length of the data
[56–58], where $\mathcal{S}(\{G_l\}) = -\ln P(\{G_l\}|\{\theta\})$ is the entropy of the model
ensemble. Hence this approach amounts to finding the model that most compresses the observed data,
i.e. the one with the minimum description length, since maximizing $P(\{\theta\}|\{G_l\})$ is
equivalent to minimizing $\Sigma$ [34, ?, 59].

¹ This may change if the layers are not entirely known, and need to be determined, as in the case
with real-valued covariates in Sec. V.

Here we observe that since the prior probabilities are nonparametric, the whole procedure also
becomes parameter-free, and hence no ad hoc choices are required a priori. In particular, for the
SBM variants considered in this work, the partition of the nodes, the degree of overlap, the
number of groups and the hierarchical structure are obtained in an entirely nonparametric fashion.

A. Choice of priors 

In order to compute $P(\{\theta\})$, we need to describe generative processes for the parameters
themselves. This means that for the model variants above we need to specify a generative process
for the partition into $B$ groups $\{b_i\}$, the layer membership matrix $\{z_{il}\}$, the
collapsed (or layer-specific) degree sequence $\{k_i\}$ (or $\{k^l_i\}$), and the layered edge
counts $\{e^l_{rs}\}$. (In the overlapping case, we need to do the same for the overlapping
partition and labeled degree sequences, which we show in Appendix C.)

Choosing prior probabilities is a subtle issue, since it depends on a priori assumptions about the
data, which usually depend on context, and often require domain-specific knowledge. In general
situations, a prudent approach is to choose uninformative priors, which do not bias the
estimation. Here we will take the systematic approach of choosing a nested sequence of priors and
hyperpriors, so that an uninformative prior is chosen only at the topmost level [?]. This approach
is intended to minimize the sensitivity to the choice of priors, and accordingly provide a shorter
description length in the majority of cases.

To generate the partition into groups, we use the process described in detail in Refs. [34, 37],
which corresponds to a multilevel Bayesian process, where the distribution of group sizes
$\{n_r\}$ (where $n_r$ is the number of nodes in group $r$) is first uniformly sampled from the set
of all allowed possibilities, and the partition is distributed uniformly, conditioned on the
observed size distribution, yielding a description length $\mathcal{L}_p = -\ln P(\{b_i\})$ given
by

$$\mathcal{L}_p = \ln \left(\!\!\binom{B}{N}\!\!\right) + \ln N! - \sum_r \ln n_r!, \tag{6}$$

where $\left(\!\!\binom{n}{m}\!\!\right) = \binom{n + m - 1}{m}$ is the total number of
$m$-combinations with repetitions from a set of size $n$.
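
The partition description length is straightforward to evaluate with log-gamma functions. The
sketch below assumes the form
$\mathcal{L}_p = \ln\left(\!\!\binom{B}{N}\!\!\right) + \ln N! - \sum_r \ln n_r!$ with
$\left(\!\!\binom{n}{m}\!\!\right) = \binom{n+m-1}{m}$; the function names and the 0-based group
labels are illustrative choices.

```python
import math


def ln_multiset(n, m):
    """ln of the multiset coefficient ((n, m)) = C(n + m - 1, m)."""
    return math.lgamma(n + m) - math.lgamma(m + 1) - math.lgamma(n)


def partition_description_length(b, B):
    """L_p = ln((B, N)) + ln N! - sum_r ln n_r! for labels b_i in {0, ..., B-1}."""
    N = len(b)
    sizes = [0] * B
    for r in b:
        sizes[r] += 1
    return (ln_multiset(B, N)
            + math.lgamma(N + 1)
            - sum(math.lgamma(n_r + 1) for n_r in sizes))


# Four nodes split evenly into two groups.
L_p = partition_description_length([0, 0, 1, 1], B=2)
```

For this example the three terms are $\ln 5$, $\ln 24$ and $2 \ln 2$, so that
$\mathcal{L}_p = \ln 30$.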

For the independent layers model without degree correction, we need to specify the node
memberships to each layer. For this, we use the process described in detail in Ref. [37] to
generate overlapping partitions. We represent each line in the $\{z_{il}\}$ matrix as a mixture
vector $\vec{z}_i$ with $C$ binary entries. We formulate a multilevel Bayesian process, where the
distribution of mixture sizes $\{n_d\}$ (where $d_i = \sum_l z_{il}$ is the mixture size of node
$i$, and $n_d$ is the number of nodes with $d_i = d$) is generated from all possibilities with
uniform probability, and the local values of $d_i$ are sampled from this distribution. The mixture
distribution $\{n_{\vec{z}}\}$ (where $n_{\vec{z}}$ is the number of nodes belonging to mixture
$\vec{z}$) is also sampled from the set of possible choices with uniform probability, conditioned
on the local mixture sizes $\{d_i\}$, and finally the individual mixtures $\{\vec{z}_i\}$
themselves are sampled from this distribution. This yields a description length
$\mathcal{L}_z = -\ln P(\{\vec{z}_i\})$ given by [?]

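$$\mathcal{L}_z = \ln \left(\!\!\binom{C}{N}\!\!\right) + \sum_d \ln \left(\!\!\binom{\binom{C}{d}}{n_d}\!\!\right) + \ln N! - \sum_{\vec{z}} \ln n_{\vec{z}}!. \tag{7}$$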

The collapsed degree sequence can be generated with a similar Bayesian process, described also in
Ref. [?], that yields a description length $\mathcal{L}_\kappa = -\ln P(\{k_i\})$ given by

$$\mathcal{L}_\kappa = \sum_r \min\left(\mathcal{L}^{(1)}_r, \mathcal{L}^{(2)}_r\right), \tag{8}$$

with

$$\mathcal{L}^{(1)}_r = \ln \left(\!\!\binom{n_r}{e_r}\!\!\right), \tag{9}$$

$$\mathcal{L}^{(2)}_r = \ln \Xi_r + \ln n_r! - \sum_k \ln n^r_k!, \tag{10}$$

and $\ln \Xi_r \simeq 2\sqrt{\zeta(2)\, e_r}$. Here $e_r = \sum_s e_{rs}$ is the number of
half-edges incident on group $r$, and $n^r_k$ is the number of nodes in group $r$ with degree $k$.

For layered networks, we need a generative process for the layer-specific degree sequence,
$\{k^l_i\}$. Although one could in principle construct nonparametric distributions that
incorporate arbitrary correlations among the degree sequences of all layers, the dimension of such
distributions is likely to exceed the evidence available in typical data as the number of layers
increases. Therefore, here we take the simpler route and assume independent distributions at each
layer, so that the description length $\mathcal{L}_\kappa = -\ln P(\{k^l_i\})$ becomes simply

$$\mathcal{L}_\kappa = \sum_l \mathcal{L}_\kappa(\{k_i\}^l), \tag{11}$$

where $\{k_i\}^l$ should be understood as the collapsed degree sequence of the graph containing
only the edges belonging to layer $l$.

Finally, to generate the edge counts $\{e^l_{rs}\}$ we note that they can be viewed as the
adjacency matrix of a layered multigraph with $B$ nodes [?]. Therefore, we may use the stochastic
block model itself to generate it, either with independent layers or edge covariates. Since these
models have their own edge count parameters, this forms a nested sequence of SBMs, encapsulating
the multilevel hierarchical structure of the network in a fully nonparametric fashion, yielding a
description length as described in Ref. [34],

$$\mathcal{L}_e = \sum_{h=1}^{L} \mathcal{S}^h_m + \sum_{h=1}^{L-1} \mathcal{L}^h_p, \tag{12}$$

where $\mathcal{S}^h_m$ is the appropriate entropy of the layered SBM in hierarchical level $h$,
and $\mathcal{L}^h_p$ is the description length of the corresponding node partition.

At the top of the hierarchy we have the remaining parameters $\{E_l\}$, denoting the number of
edges in each layer. For completeness, they can be easily generated by including a uniform prior
$P(\{E_l\}) = \left(\!\!\binom{C}{E}\!\!\right)^{-1}$; however, this only adds an overall constant
to the description length, which is not relevant to any comparisons made in this paper.

To summarize, using the shorthand $\{\theta\}$ for the entire set of parameters, we have for each
given model (i.e. edge covariates and independent layers, with any optional combination of degree
correction and group overlap) an overall description length

$$\Sigma = \mathcal{S}(\{\theta\}) + \sum_\theta \mathcal{L}_\theta, \tag{13}$$

where $\mathcal{S}(\{\theta\})$ is the appropriate SBM entropy, and $\mathcal{L}_\theta$ is the
description length of a specific parameter ensemble, chosen from Eqs. (6) to (12) (and Eqs. C1 to
C2), as appropriate.

B. Confidence levels

As described above, selecting the model with the smallest description length $\Sigma$ is the
appropriate manner of balancing model complexity and goodness of fit. However, we often desire a
more refined approach where the alternative model can be accepted or rejected with a degree of
confidence, in a nonparametric fashion. This can be achieved, as proposed in Ref. [?], by
inspecting the posterior odds ratio [?],

$$\Lambda = \frac{P(\{\theta\}_a|\{G_l\}, \mathcal{H}_a)\, P(\mathcal{H}_a)}{P(\{\theta\}_b|\{G_l\}, \mathcal{H}_b)\, P(\mathcal{H}_b)} \tag{14}$$

$$= \exp(-\Delta\Sigma)\, \frac{P(\mathcal{H}_a)}{P(\mathcal{H}_b)}, \tag{15}$$

where $P(\{\theta\}|\{G_l\}, \mathcal{H})$ is the posterior according to hypothesis $\mathcal{H}$
(i.e. a specific model class), $P(\mathcal{H})$ is any prior belief for hypothesis $\mathcal{H}$,
and $\Delta\Sigma = \Sigma_a - \Sigma_b$ is the difference in description length between both
hypotheses. For $\Lambda < 1$ we have that $\mathcal{H}_a$ is rejected over $\mathcal{H}_b$ with a
confidence that increases as $\Lambda$ decreases. Often the values of $\Lambda$ are divided into
subjective intervals of evidence strength [?], with $\Lambda = 1/100$ conventionally being
considered the plausibility threshold, below which $\mathcal{H}_a$ is decisively rejected in favor
of $\mathcal{H}_b$, and with $\Lambda \in [1/3, 1]$ being considered only a negligible difference
between both models. In the case where there is no preference for either model,
$P(\mathcal{H}_a) = P(\mathcal{H}_b)$, the value of $\Lambda$ is called the Bayes factor [?], which
has the same interpretation. In the following, we will always assume
$P(\mathcal{H}_a) = P(\mathcal{H}_b)$, and impose $\Lambda < 1$, by always putting the preferred
hypothesis in the denominator of Eq. (14).
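
In practice the comparison reduces to a difference of description lengths. The following sketch
assumes that the posterior odds ratio takes the form
$\Lambda = \exp(-\Delta\Sigma)\, P(\mathcal{H}_a)/P(\mathcal{H}_b)$ with
$\Delta\Sigma = \Sigma_a - \Sigma_b$ measured in nats. The function names and the label
"intermediate" for values between the two thresholds quoted above are illustrative assumptions.

```python
import math


def posterior_odds_ratio(sigma_a, sigma_b, prior_a=0.5, prior_b=0.5):
    """Lambda = exp(-(Sigma_a - Sigma_b)) * P(H_a) / P(H_b)."""
    return math.exp(-(sigma_a - sigma_b)) * prior_a / prior_b


def evidence_against_a(lam):
    """Coarse reading of Lambda <= 1 using the thresholds 1/100 and 1/3."""
    if lam < 1 / 100:
        return "decisive"
    if lam >= 1 / 3:
        return "negligible"
    return "intermediate"


# Hypothetical description lengths (in nats) for two competing models.
sigma_preferred = 1000.0
sigma_alternative = 1007.5
lam = posterior_odds_ratio(sigma_alternative, sigma_preferred)
verdict = evidence_against_a(lam)
```

With equal prior beliefs, a difference of only $\ln 100 \approx 4.6$ nats in description length
already reaches the plausibility threshold, so the 7.5 nats of this example count as decisive.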


C. Inference algorithm 

The description length of a given flavor of the SBM given by Eq. (13) is an objective function
that needs to be minimized with some appropriate algorithm. The only known algorithm that is
guaranteed to find the global minimum is the exhaustive computation of the description length for
every possible hierarchical partition of the network, which is infeasible in any practical
scenario with networks with more than a few nodes and edges. Therefore, we must resort to
approximate methods. Here we employ the multilevel MCMC algorithm described in Ref. [62], together
with the hierarchical generalization presented in Ref. [34], and the extension to overlapping
groups presented in Ref. [37]. The advantage of these algorithms is their good typical running
times, and their capacity to overcome metastable states by performing agglomerative moves.² The
division of the network into layers does not alter these algorithms in any significant way, other
than a straightforward bookkeeping of the layer membership of each edge. In particular, by using
appropriate sparse data structures that do not change in size if the number of layers is
increased, the division into layers does not significantly alter the typical running times of the
algorithms, which remain $O(N \ln^2 N)$ in their greedy versions, independent of the number of
groups $B$ and layers $C$, and hence are applicable to reasonably large networks. An efficient C++
implementation of these algorithms is freely available as part of the graph-tool Python library
[65] at http://graph-tool.skewed.de.
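
To give a sense of why exhaustive enumeration is ruled out, the sketch below counts the number of
ways to partition $N$ nodes into non-empty groups (the Bell numbers), computed with the Bell
triangle. This counts only flat, non-overlapping partitions and is therefore a lower bound on the
number of hierarchical partitions; it is an illustration added here, not part of the inference
algorithm itself.

```python
def bell_number(n):
    """Number of partitions of a set of n elements, via the Bell triangle."""
    row = [1]
    for _ in range(n - 1):
        new_row = [row[-1]]          # each row starts with the previous row's last entry
        for x in row:
            new_row.append(new_row[-1] + x)
        row = new_row
    return row[-1]


partition_counts = {n: bell_number(n) for n in (5, 10, 20)}
```

The count grows faster than exponentially with the number of nodes, so evaluating the description
length of every partition is impractical well before network sizes of practical interest.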


² We note that in principle other algorithms such as belief propagation [63] and spectral
clustering [64] could be used as well, provided they are suitably adapted to the nonparametric
likelihoods considered here.



Figure 2. (Color online) Artificial network example containing an informative layered structure.
(a) The collapsed graph possesses no discernible structure, i.e. it corresponds to a fully random
graph. (b) When the division of edges into two layers [grey and red (light grey)] is taken into
account, a four-group structure is revealed.


IV. WHEN ARE LAYERS INFORMATIVE? 

Layers are informative of the network structure if their incorporation into the model yields a
more detailed description of the data, when compared to a model that is only based on the
collapsed structure of the network. An illustration of an informative layered structure is shown
in Fig. 2. In this example, an artificial network composed of two layers is constructed. The
collapsed graph corresponds to a fully random network; however, the division of the edges into
layers is such that four fully assortative groups exist in one of the layers. Clearly, the layered
division yields structural information that is not discernible in the collapsed graph. This
implies that, in more general cases, omitting such information on the edges could significantly
obscure structure present in the data [?].
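
A construction of this kind can be written down directly at the level of edge counts. The sketch
below assumes equal-sized groups, for which a fully random collapsed graph corresponds to a
uniform matrix $e_{rs}$ (with the diagonal counted twice), and splits it into one fully
assortative layer and one layer holding the remaining edges. The function names and the value of
`a` are hypothetical.

```python
def informative_layer_counts(B, a):
    """Edge counts e^l_rs for two layers whose sum is the uniform matrix a.

    Diagonal entries follow the convention e_rr = twice the number of
    within-group edges, so `a` should be even.
    """
    assortative = [[a if r == s else 0 for s in range(B)] for r in range(B)]
    remainder = [[0 if r == s else a for s in range(B)] for r in range(B)]
    return [assortative, remainder]


def collapse_counts(e_layers):
    """Collapsed edge counts e_rs = sum_l e^l_rs."""
    B = len(e_layers[0])
    return [[sum(e[r][s] for e in e_layers) for s in range(B)] for r in range(B)]


layers = informative_layer_counts(B=4, a=10)
collapsed = collapse_counts(layers)
```

Every entry of the collapsed matrix is identical, so the collapsed graph carries no trace of the
four groups, whereas the first layer alone contains only within-group edges.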

However, it is important to realize that the opposite is also true: If the edge distribution into
layers is uncorrelated with the group divisions, it can also obscure structural information which
would otherwise be revealed if the layer information were to be ignored. This happens because
increasing the number of layers in the model also increases its effective dimension. If the total
size and density of the network remain constant as the number of layers (and hence the effective
dimension of the model) increases, the available data become increasingly sparse, which reduces
the inference precision, since it becomes increasingly difficult to distinguish signal from noise.
An example of this is shown in Fig. 3, corresponding to a collapsed $B = 2$ assortative SBM with
equal-sized groups and edge counts given by
$e_{rs} = 2E[\delta_{rs} c/B + (1 - \delta_{rs})(1 - c)/B(B - 1)]$, with $c \in [0, 1]$ being a
mixing parameter, where the edges are distributed randomly in $C$ layers. As $C$ increases, both
model variants (edge covariates and independent layers) display increasing degradation when
inference is performed, with the detectability transition [66] shifting to higher values of $c$.
For the SBM with independent layers, the transition shifts to $c^* \to 1$ as $C \to \infty$, and in
this limit no information at all on the graph structure can be inferred. The version with edge
covariates displays a relatively superior performance, with the transition remaining at $c^* < 1$
for $C \to \infty$, since it is conditioned on the collapsed graph. Nevertheless, even in this case
the degradation caused by increasing $C$ is very noticeable.

Figure 3. (Color online) An excessive number of layers can obscure network structure. Top: A
collapsed two-group structure is generated, and the edges are randomly distributed in $C$ layers.
Middle and Bottom: As the number of edges per layer $E/C$ diminishes, the structure inside each
layer becomes increasingly sparse, and the overall quality of the inference worsens. The middle
panel shows the normalized mutual information (NMI) between the planted and inferred partitions,
using the SBM with independent layers, for a network of $N = 10^{?}$ nodes and average degree
$\langle k \rangle = 2E/N = 14$ as a function of the mixing parameter $c$, as described in the
text. The bottom panel is the same as the middle one, but using the SBM with edge covariates. In
both cases the vertical lines mark the detectability transition point for the collapsed SBM,
$c^* = 1/B + (B - 1)/(B\sqrt{\langle k \rangle})$ [?].

Figure 4. (Color online) Two generative models for a layered social network of physicians [67].
(a) Inferred DCSBM for the collapsed network, with the edges assumed to be randomly distributed
among the layers. (b) Inferred DCSBM with edge covariates, where each layer corresponds to one
type of acquaintance. Below each figure is shown the posterior odds ratio $\Lambda$, relative to
the preferred model (a). The circular layout with edge bundling [68] represents the inferred node
hierarchy (indicated also by the red nodes and edges), as explained in the text (see also
Ref. [34]).

Because of this problem, it is important to consider if we indeed need the layered structure to
describe the large-scale structure of a network, or if it needs to be coarse-grained or even
discarded. This can be done by considering a null model where the edges are distributed among the
layers in a manner that is entirely independent of the group structure, and is parametrized only
by the total number of edges in each layer, $\{E_l\}$. Let us use the shorthand $\{\theta\}$ for
the possible set of parameters of a collapsed SBM. This null model has a likelihood given simply
by


$$P(\{G_l\} \mid \{\theta\}, \{E_l\}) = P(G_c \mid \{\theta\}) \times \frac{\prod_l E_l!}{E!},$$ (16)

where the first term is the likelihood of the collapsed SBM
and the second accounts for the random distribution of edges
across the layers (the above equation is valid only for
simple graphs; for multigraphs see Appendix A). The full
posterior and its corresponding description length are
computed just as before, by including the priors for
{e_rs}, {b_i}, {6}, {k_i} and {k_i}. We can then compare the
description length of this null model with any of the other
layered variants, and decide if there is enough evidence to
justify the incorporation of layers that are correlated with
the group structure.
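
The layer term of this null model is simple to evaluate numerically. The following Python sketch assumes that the null likelihood has the form $P(G_c \mid \{\theta\}) \times \prod_l E_l!/E!$ and that description lengths are measured in nats. The function names and the example numbers are hypothetical and serve only to illustrate the bookkeeping.

```python
import math


def log_factorial(n: int) -> float:
    """Return ln(n!) via the log-gamma function."""
    return math.lgamma(n + 1)


def null_layer_log_term(layer_edge_counts: list[int]) -> float:
    """Return ln(prod_l E_l! / E!), the log of the layer factor."""
    total = sum(layer_edge_counts)
    return sum(log_factorial(e) for e in layer_edge_counts) - log_factorial(total)


def null_description_length(collapsed_dl: float, layer_edge_counts: list[int]) -> float:
    """Add the cost of the random layer assignment to a collapsed description length."""
    return collapsed_dl - null_layer_log_term(layer_edge_counts)


# Hypothetical numbers: a collapsed fit of 1000 nats and three layers.
print(null_description_length(1000.0, [120, 80, 50]))
```

Because the layer term depends only on the counts $\{E_l\}$, it is identical for every node partition, and it matters only when models with different layer structure are compared.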

As a concrete example, here we consider an empirical social
network of N = 241 physicians, collected during a survey
[67]. Participants were asked which other physicians they
would contact in hypothetical situations. The questions
asked were: 1. “When you need information or advice about
questions of therapy where do you usually turn?”, 2. “And
who are the three or four physicians with whom you most
often find yourself discussing cases or therapy in the
course of an ordinary week - last week for instance?”,
3. “Would you tell me the first names of your three friends
whom you see most often socially?”. The answers to each
question represent edges in one specific layer of a directed
network. If one applies the DCSBM to the collapsed graph
(which provides the best fit among the alternatives), it
yields a division into B = 9 groups, as shown in the left
panel of Fig. 4, including also a division into three
disconnected components (corresponding to different cities).
Between the layered SBM versions, it is the model with edge
covariates that turns out to be the better fit to the data
(i.e. yields a lower description length); it divides the
network into B = S groups, as shown in the right panel of
Fig. 4. When inspecting the edge counts visually, one does
not notice any significant difference between the patterns
in each layer. Indeed, when comparing the description
lengths between the null model with random layers above and
the SBM with edge covariates, we find that the latter is
strongly rejected with a posterior odds ratio
$\Lambda \approx 10^{-…}$. Therefore, there is no noticeable
evidence in the data to support any correlation of layer
divisions with the large-scale structure present in the
graph. This suggests that the important descriptors of this
social network are mainly the overall acquaintances among
physicians, not their precise types (at least as measured by
the survey questions).
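
To make this kind of comparison concrete, a posterior odds ratio can be computed from two description lengths. The sketch below assumes that the description length is the negative logarithm (in nats) of the joint probability of data and parameters, so that Λ equals exp(-ΔΣ) times the prior odds. The function name is hypothetical and the numbers are invented for illustration; they are not the values of the physician network.

```python
import math


def log10_posterior_odds(dl_model: float, dl_reference: float,
                         log_prior_odds: float = 0.0) -> float:
    """Return log10 of Lambda = exp(-(dl_model - dl_reference)) * prior odds."""
    return (dl_reference - dl_model + log_prior_odds) / math.log(10)


# Hypothetical description lengths in nats.
print(log10_posterior_odds(dl_model=5230.0, dl_reference=5100.0))
```

Working with log10 Λ avoids numerical underflow, since a difference of only a few hundred nats already corresponds to odds far below double-precision range.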

We now turn to another example, where informative layered
structure can be detected. We consider the vote correlation
network of federal deputies in the Brazilian national
congress. Based on public data containing the votes of all
deputies in all chamber sessions across many years*, we
obtained the correlation matrix between all deputies. We
constructed a network by connecting an edge from each deputy
to the 10 other deputies with whom she is most correlated in
the considered period†. We then separated the network in two
layers, corresponding to two consecutive four-year terms,
1999-2002 and 2003-2006. Deputies not present during the
whole period were removed from the network, yielding a
network with N = 224 nodes and E = 7247 edges in total. When
fitting the DCSBM for the collapsed network (which is again
the best model), we obtain the B = 11 partition shown in the
left panel of Fig. 5. It shows a hierarchical division that
is largely consistent with party and coalition lines, as
well as positions in the political spectrum (with a
noticeable deviation being a group of left-wing parties
composed of PDT, PSB and PCdoB being grouped together with
center-right parties PTB and PMDB). When incorporating the
layers, the best model fit is obtained by the DCSBM with
independent layers, which yields a B = 11 division mostly
compatible with (but not fully identical to) the collapsed
network, although with a different hierarchical structure,
as can be seen in the right panel of Fig. 5. However, the
layered representation of this network reveals a major
coalition change between the two terms, consistent with the
shift of power that occurred with the election of a new
president belonging to the previous main opposition party:
In the 1999-2002 term we see a clear division into
government and opposition groups (as captured in the topmost
level of the hierarchy), with most edges existing between
groups of the same camp, corresponding to a
right-wing/center government led by the PSDB, PMDB, PFL, DEM
and PP parties, and a left-wing opposition composed mostly
of PT, PDT, PSB and PCdoB. After 2002, we observe a shifted
coalition landscape, with a left-wing/center government
predominantly formed by PT, PMDB, PDT, PSB and PCdoB, and an
opposition led by PSDB, PFL, DEM and PP. Because of this
noticeable change in the large-scale network structure
(which is completely erased in the collapsed network), the
null model with random layers ends up being forcefully
rejected with $\Lambda \approx 10^{-…}$, meaning that the
layered structure is very informative on the network
structure.

* Available at http://www.camara.gov.br/

† We experimented with other threshold values, and obtained
similar results.
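
The network construction described above can be sketched as follows. This is an illustrative Python fragment, not the code used for the analysis: it assumes a precomputed square correlation matrix `corr` (how the votes are encoded and correlated is not specified here), and the function name `top_k_correlation_edges` is hypothetical.

```python
def top_k_correlation_edges(corr: list[list[float]], k: int = 10) -> list[tuple[int, int]]:
    """Return directed edges (i, j) linking each node i to its k most correlated nodes j."""
    n = len(corr)
    edges = []
    for i in range(n):
        # Rank all other nodes by decreasing correlation with i.
        others = [j for j in range(n) if j != i]
        others.sort(key=lambda j: corr[i][j], reverse=True)
        edges.extend((i, j) for j in others[:k])
    return edges


# Toy 3-node example with k = 1.
corr = [[1.0, 0.9, 0.1],
        [0.9, 1.0, 0.5],
        [0.1, 0.5, 1.0]]
print(top_k_correlation_edges(corr, k=1))
```

The resulting graph is directed, since being among the top k of one node does not imply the converse. Splitting into layers would then amount to repeating the construction on the correlation matrix of each term.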

In the above examples we made a comparison between the
layered model and a null model with fully random layers. In
some scenarios we might be interested in a more nuanced
approach, where the layers are coarse-grained with a more
appropriate level of granularity. This can be done by
merging some of the layers into bins, such that inside each
bin the layer membership of the edges is distributed
regardless of the group structure. Let $\ell$ specify a set
of layers that were merged in one specific bin, and
$\{\theta\}_{\{\ell\}}$ be a shorthand for the possible set
of parameters of a layered SBM $\{G_\ell\}$ (with
independent layers or edge covariates) where each bin $\ell$
corresponds to an individual layer. The likelihood of this
model conditioned on a specific bin set $\{\ell\}$ is given
by


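$$P(\{G_l\} \mid \{\theta\}_{\{\ell\}}, \{\ell\}, \{E_l\}) = P(\{G_\ell\} \mid \{\theta\}_{\{\ell\}}) \times \prod_\ell \frac{\prod_{l \in \ell} E_l!}{E_\ell!},$$ (17)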

where $E_\ell = \sum_{l \in \ell} E_l$ is the number of
edges in bin $\ell$ (the above equation is valid only for
simple graphs; see Appendix A for the more general case with
parallel edges). When considering the full posterior, we
need to include the priors for $\{\theta\}_{\{\ell\}}$ as
before, but also for the binning $\{\ell\}$ itself. If the
layers can be grouped arbitrarily, we have


$$P(\{\ell\}) = \frac{\prod_\ell n_\ell!}{C!} \times \left(\!\!\binom{M}{C}\!\!\right)^{-1},$$ (18)


where $n_\ell$ is the number of layers in bin $\ell$ and M
is the total number of layer bins. If the layers are
inherently ordered, and thus can only be contiguously
binned, this becomes instead simply


$$P(\{\ell\}) = \left(\!\!\binom{M}{C}\!\!\right)^{-1}.$$ (19)
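
As a numerical illustration, the cost of the binning prior can be evaluated as $-\ln P(\{\ell\})$. The sketch below rests on two assumptions: that the double-parenthesis symbol denotes the multiset coefficient, $\left(\!\!\binom{M}{C}\!\!\right) = \binom{M + C - 1}{C}$, and that the priors take the forms $\prod_\ell n_\ell!/C! \times \left(\!\!\binom{M}{C}\!\!\right)^{-1}$ for arbitrary grouping and $\left(\!\!\binom{M}{C}\!\!\right)^{-1}$ for contiguous bins. The function names are hypothetical.

```python
import math


def log_multiset(m: int, c: int) -> float:
    """Return ln of the multiset coefficient binom(m + c - 1, c)."""
    return math.lgamma(m + c) - math.lgamma(c + 1) - math.lgamma(m)


def binning_prior_dl(bin_sizes: list[int], ordered: bool) -> float:
    """Return -ln P({l}) for bins that hold bin_sizes[k] layers each."""
    n_layers = sum(bin_sizes)   # C
    n_bins = len(bin_sizes)     # M
    dl = log_multiset(n_bins, n_layers)
    if not ordered:
        # Extra cost of choosing which layers go into which bin.
        dl += math.lgamma(n_layers + 1) - sum(math.lgamma(n + 1) for n in bin_sizes)
    return dl


print(binning_prior_dl([2, 1, 1], ordered=True))
print(binning_prior_dl([2, 1, 1], ordered=False))
```

Under these assumptions a single bin (M = 1) costs nothing, and arbitrary grouping is never cheaper than contiguous binning with the same bin sizes.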


Figure 5. (Color online) Network of vote correlations among
federal deputies of the Brazilian national congress during
two consecutive four-year terms, 1999-2002 and 2003-2006.
(a) DCSBM fit for the collapsed network obtained by merging
both terms, corresponding to a null model where the edges
are randomly distributed between the layers. The group
labels correspond to the predominant parties inside each
group, determined after the inference had been performed
(the size of the label indicates the proportion of each
party inside the group). (b) DCSBM with independent layers
for the network divided into two terms. In both cases is
shown the posterior odds ratio Λ relative to the best model
[in this case (b)]. The layout is the same as in Fig. 4.

If we make M = 1 we recover the original null model above.
Algorithmically, one can find the appropriate bins in a
variety of ways. A simple approach is to use agglomerative
hierarchical clustering, i.e. by putting at first each layer
in its own bin, and subsequently merging bins according to
the reduction of the overall description length. We explore
this idea further in Sec. V when dealing with real-valued
edge covariates.
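
The agglomerative procedure just described can be sketched as a greedy loop. This is a minimal illustration under stated assumptions, not the algorithm used for the results: `description_length` is a hypothetical user-supplied callable that returns the total description length of the model fitted to a given binning (in practice this requires refitting the SBM), and the loop stops at the first merge that no longer reduces it.

```python
from typing import Callable, Sequence


def agglomerative_binning(layers: Sequence[int],
                          description_length: Callable[[list], float],
                          contiguous: bool = True):
    """Greedily merge layer bins while the description length decreases."""
    bins = [[l] for l in layers]          # start with one bin per layer
    best = description_length(bins)
    while len(bins) > 1:
        candidates = []
        for a in range(len(bins)):
            # Ordered layers may only be merged with their right neighbor.
            partners = [a + 1] if contiguous else range(a + 1, len(bins))
            for b in partners:
                if b >= len(bins):
                    continue
                merged = bins[:a] + [bins[a] + bins[b]] + bins[a + 1:b] + bins[b + 1:]
                candidates.append((description_length(merged), merged))
        dl, merged = min(candidates, key=lambda t: t[0])
        if dl >= best:
            break                         # no merge improves the fit
        best, bins = dl, merged
    return bins, best


# Toy cost: one unit per bin, plus a penalty for mixing layers of different kind.
kind = {0: "a", 1: "a", 2: "b", 3: "b"}
toy_dl = lambda bins: len(bins) + 10 * sum(len({kind[l] for l in b}) > 1 for b in bins)
print(agglomerative_binning([0, 1, 2, 3], toy_dl))
```

A greedy search of this kind is not guaranteed to find the globally optimal binning; it only follows the sequence of locally best merges.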


A. Layers as evidence for overlaps 

There is an important correspondence between layered
networks and overlapping structures of collapsed networks.
Namely, the inference of overlapping structures in collapsed
graphs can to some extent be interpreted as the inference of
latent layers [?] to which the edges belong, where each
(connected) group pair (r, s) would correspond to a
different layer. Because of this correspondence, any a
priori knowledge of the division into layers can
fundamentally alter the interpretation of the data in
situations where a nonoverlapping model would otherwise be
considered a better fit for the collapsed network [37].

This is better understood by considering the following
generative process as an example: A network is generated
with C layers, where in each layer E/C edges are
randomly placed between the nodes that belong to that
layer. The layer membership mixtures are parameterized
as $n_{\vec z} \propto \prod_l \mu^{z_l}$, up to a normalization constant, and with
$\mu \in [0,1]$ controlling the degree of layer overlap: For
$\mu \to 0$ we obtain asymptotically nonoverlapping layers
with $n_l = N/B$ nodes at each layer l, and for $\mu = 1$ all
mixtures $\vec z$ have the same size. This process corresponds


to a layered SBM with only one group, B = 1, and the
aforementioned layer structure. If we consider only the
collapsed graph, with the layer information removed, the
corresponding topology can be generated in two alternative
ways: 1. An overlapping SBM with B = C groups and mixtures
$\vec b_i = \vec z_i$, and edge counts
$e_{rs} = 2E\delta_{rs}/B$. 2. A nonoverlapping SBM with
each individual mixture as its own group, indexed by an
integer in $[1, 2^C - 1]$, resulting in a total of
$B = 2^C - 1$ groups, and edge counts given by





TlirpThg 


Tl/qr-, Tl/f-, . 


( 20 ) 


The description length of the collapsed graph generated 
with the layered model is 

E, = 2E-Elnj^+C,{{n,{^i)}), ( 21 ) 

which is in fact identical to the overlapping SBM, cor¬ 
responding to C ^ B and in the above 

equation. The nonoverlapping model, on the other hand, 
has a description length given by 





( 22 ) 


where corresponds to a nonoverlapping par¬ 

tition of individual mixtures. As discussed in Ref. m, 
we may have < Ef if the number of nodes at the 
intersections is sufficiently large. Therefore the
nonoverlapping model may indeed be considered the most
parsimonious of the three in that case, which is arguably
non-intuitive, since the overlapping SBM seems closer to
the original model. However, the situation changes when
the observed data include the layer information on the
edges. In this case, we must include the random division
of the edges into layers in the two collapsed models, by
adding, according to Eq. (16), the following term to the
description length:

$$\ln E! - \sum_l \ln E_l! = \ln E! - C \ln (E/C)!.$$ (23)

Because of this difference, the layered model with B = 1
always becomes the preferred choice (see Fig. 6).
Therefore, when layer information on the edges is available,
it can significantly change which model is preferred, and
tip the scale towards the overlapping description. However,
we emphasize that this extra information does nothing
regarding the decision between both collapsed models; it
only supports the acceptance of the third layered variant.
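
To get a feeling for the size of this penalty, the term $\ln E! - C \ln (E/C)!$ can be evaluated per edge. By Stirling's approximation it approaches $\ln C$ nats per edge for large E, so each collapsed model pays roughly $E \ln C$ for ignoring the layers. The following Python sketch checks this numerically; the function name is hypothetical, and `math.lgamma` is used so that E/C need not be an integer.

```python
import math


def layer_cost_per_edge(total_edges: int, n_layers: int) -> float:
    """Return [ln E! - C ln (E/C)!] / E for E edges split evenly over C layers."""
    per_layer = total_edges / n_layers
    cost = math.lgamma(total_edges + 1) - n_layers * math.lgamma(per_layer + 1)
    return cost / total_edges


for n_layers in (1, 2, 4, 8):
    print(n_layers, layer_cost_per_edge(100_000, n_layers), math.log(n_layers))
```

Since the same constant is added to both collapsed models, it leaves their relative ranking unchanged, in line with the remark above.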

It is important to consider the above comparison together
with the results of Ref. [37], which showed that the
overlapping variants of the SBM are seldom the best fit for
the empirical networks used for that work, which contained
no layer information. As the example above shows, this
assessment may change (at least in principle) if any
division among the edges can be assumed a priori.
Therefore, for a fair assessment of the best generative
process, it is imperative to leverage all available
information, in particular the division into layers, or the
existence of edge covariates.

Figure 6. (Color online) Top left: Description length per
edge $\Sigma/E$ for the collapsed planted partition model
described in the text as a function of the overlap
parameter $\mu$, with $N = 10^{…}$,
$\langle k \rangle = 2E/N = 10$ and B = 4 (illustrated in
the lower left panel). The two curves show the description
length of the planted overlapping model, and the equivalent
non-overlapping model with $2^B - 1$ groups (illustrated in
the lower middle panel). Only for values of $\mu$ below the
intersection point is the original overlapping model
preferred over the nonoverlapping one. Top right: The same
as in the top left, but with layer information included.
The third curve corresponds to a B = 1 model with C = 4
independent layers (illustrated in the lower right panel),
whereas the first two curves correspond to the same
collapsed models as in the left panel, but with a random
distribution of edges in the C = 4 layers. The model with
independent layers is preferred over the alternatives in
the entire parameter range.


V. EDGES WITH REAL-VALUED CORRELATES

The models discussed so far are capable of generating data
with discrete values associated with the existing edges.
However, in many important situations the values associated
with edges are real values, corresponding to weights,
distances, capacities, etc. Here we show how the previous
models can be straightforwardly adapted to these cases as
well, using a discretization approach. As before, we simply
assume that the graph is divided into C discrete layers;
however, we ascribe to each layer l a real value $x_l$,
randomly sampled from a PDF p(x), such that all edges in
the same layer possess the same edge correlate. In the case
that all edges have a different correlate, we will have
C = E layers. As in Sec. IV, we assume that the layers
themselves are grouped into bins $\{\ell\}$, with
$\{\theta\}_{\{\ell\}}$ being a shorthand for the possible
set of parameters of a layered SBM (with independent layers
or edge covariates) where each bin $\ell$ corresponds to an
individual layer. The whole PDF of the data generated in
this manner becomes

(24) 

where the first term is given by Eq. (17). The advantage of
this approach is that the overall correlate PDF $p(x_l)$
amounts to a constant multiplicative factor in the
likelihood, independent of our choice of bins, and
therefore cannot influence either the maximum likelihood
estimate or the maximum of the posterior distribution. For
these purposes we can thus avoid specifying it altogether.
This contrasts with another generalization of the SBM for
real-valued covariates proposed in Ref. [?], which requires
the exact form of the correlate distribution to be
specified prior to inference (on the other hand, the
approach presented here is based on the discretization of
the correlates into bins, whereas in Ref. [?] no binning is
necessary).

In order to choose the best number of layers, we maximize
the posterior
$P(\{\theta\}_{\{\ell\}}, \{\ell\} \mid \{G_x\})$, which
involves the priors of the SBM parameters, as well as for
the bins as given by Eq. (19). Therefore, both the number
and the boundary positions of the bins can be determined in
a nonparametric manner, based only on the data.
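
The discretization step itself, i.e. turning real-valued edge correlates into discrete layers once bin boundaries are given, can be sketched as follows. This is an illustrative Python fragment: `boundaries` is a hypothetical list of interior cut points, which in the method described above would be the outcome of the posterior maximization rather than an input chosen by hand.

```python
from bisect import bisect_right


def bin_edges_by_covariate(edges, boundaries):
    """Split (source, target, x) edges into layers delimited by sorted cut points."""
    layers = [[] for _ in range(len(boundaries) + 1)]
    for source, target, x in edges:
        # bisect_right returns the index of the bin that contains x.
        layers[bisect_right(boundaries, x)].append((source, target))
    return layers


edges = [(0, 1, 0.5), (1, 2, 1.5), (2, 3, 2.5), (0, 3, 1.0)]
print(bin_edges_by_covariate(edges, boundaries=[1.0, 2.0]))
```

With this convention a value that coincides with a cut point is assigned to the bin on its right; any consistent tie-breaking rule would serve equally well.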
Figure 7. (Color online) Global airport network of openflights.org. Top left: Distribution of edge distances. The bins labeled
from (a) to (e) correspond to the best division of the edges into layers according to the method described in the text. Top
right: Spatial distribution of airports. The colors correspond to the division of the network into groups, according to the best
fit of the DCSBM with independent layers (the same color coding is used in the remaining panels). Bottom: Individual
layers of the DCSBM fit, corresponding to the bins in the top panel. The layout is the same as in Fig. 4.


As an example we consider the global airport network as
collected by openflights.org. This is a directed
multigraph, where the N = 3253 nodes are airports and the
E = 67154 edges represent existing flights. Since the
position of the airports is known, we can characterize the
edges by their geodesic distance, which we treat as a
covariate. In applying the DCSBM with independent layers,
using the method outlined above to find the optimal binning
of the distances, we find a division into B = 34 groups,
and M = 5 distance bins, as shown in Fig. 7. When
inspecting the spatial distribution of airports, we observe
that the obtained groups correspond to fairly contiguous
geographical regions (see Fig. 7, top right). The
distribution of edges across the layers reveals a
hierarchical organization strongly correlated with flight
distance: The first layer captures local “intra-group”
flights with relatively short distance, whereas the upper
layers capture increasingly “inter-group” flights with
longer distances. The nodes with large degree tend to be
those that belong to multiple layers, i.e. major airport
hubs that service both short and long-distance flights.
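
For readers who wish to reproduce the distance covariate, one common choice is the great-circle (haversine) distance between the coordinates of the two airports. The text does not state which geodesic formula was used, so the following Python sketch is an assumption: it treats the Earth as a sphere of mean radius 6371 km, and the function name is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; the Earth is treated as a sphere


def great_circle_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Return the haversine distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to guard against rounding slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


# Distance between two points on the equator, a quarter turn apart.
print(great_circle_km(0.0, 0.0, 0.0, 90.0))
```

Each edge would then carry the distance between its endpoints as its correlate, ready to be binned into layers.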


VI. TIME-VARYING NETWORKS 


Temporal networks can be viewed as a special case of
networks with real-valued edge correlates representing
their existence at a specific time, $x_l = t_l$, and hence
we can use the same approach as in the previous section*.
By using the different model versions presented in this
work, different types of temporal patterns can be captured.
In all cases, by separating the network into time bins, it
is assumed that inside each bin the edges are placed
between the groups in a random fashion, conditioned only on
the group membership of the receiving nodes. When using the
SBM with edge covariates, the nodes are assumed to belong
to all time layers, and as such can receive edges at all
times, depending only on the activity of the entire group
at any given time. On the other hand, the version with
independent layers allows for an individualized placement
of the nodes into the layers (independently of their group
membership) such that their activity may be separately
regulated. The activity inside each layer can be even more
fine-tuned in the degree-corrected model with independent
layers, since the degree of each node at each time window
is separately specified. In all these examples, the group
memberships are forced to be stable in time. This can be
changed by using an overlapping SBM [37], where the group
memberships (which are in this case attributes of the
half-edges of the graphs) can change arbitrarily in time.
As before, given some empirical observation, the most
appropriate model choice is the one with the minimum
description length.

* Other formulations of temporal networks are possible. For
instance, one could attribute to each edge a tuple $x_l$ of
two times, containing a creation and deletion time,
respectively. The approach presented here can be adapted to
such a multivariate case in a straightforward manner, by
using multidimensional bins.

Figure 8. (Color online) Proximity network between
high-school students [69]†. Top: Network activity (i.e.
probability density of an edge being present) as a function
of time, over a period of one day. The bins labeled from
(a) to (j) correspond to the best division of the edges
into layers according to the method described in the text.
Bottom: Individual layers of the DCSBM fit, corresponding
to the bins in the top panel. The layout is the same as in
Fig. 4.

† Retrieved from http://sociopatterns.org

The discretization approach presented here is similar in
spirit to the detection of “change points” in networks
[39]. Since it is assumed that inside each time window the
edges are placed in a manner that is independent of their
time relative to one another, the most appropriate time
binning is the one that partitions the time series in such
a way that inside each time window the large-scale network
structure does not change significantly. The interface
between two bins can therefore be interpreted as a change
point where the large-scale structure has changed in a
measurable and statistically significant way.
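
Reading off the change points from an inferred binning is then a matter of bookkeeping. The Python sketch below assumes that time was first cut into fine-grained windows with edges `window_edges` (a hypothetical input), and that `bins` lists, in temporal order, which contiguous windows were merged into each bin; both names are illustrative.

```python
def change_points(bins, window_edges):
    """Return the times at which one contiguous bin of windows ends and the next begins."""
    points = []
    for left, right in zip(bins, bins[1:]):
        if left[-1] + 1 != right[0]:
            raise ValueError("bins must be contiguous and in temporal order")
        # Window k spans window_edges[k] to window_edges[k + 1].
        points.append(window_edges[right[0]])
    return points


# Four windows of 10 time units each, merged into three bins.
print(change_points([[0, 1], [2], [3]], window_edges=[0, 10, 20, 30, 40]))
```

A single bin yields no change points, which corresponds to a network whose large-scale structure is statistically indistinguishable across the whole observation period.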

Here we show an application of this method to a
time-resolved proximity network between N = 126 high-school
students, recorded over a period of four days in 2011 [69],
of which we isolated only the first day to simplify the
analysis. In this experiment, volunteering students wore
proximity sensors during school hours, which recorded an
edge and its time if two students were below a distance
threshold for a pre-specified amount of time. If we apply
the DCSBM with independent layers to this dataset (again
providing a better fit), the best partition is found for
B = 33 groups, and the whole time series is divided into
M = 10 periods, as can be seen in Fig. 8. The hierarchical
partition is in accordance with the existence of three
classes, as can be seen in the first levels of the
hierarchy. Each period marks a region in time where a
distinct large-scale structure is observed. These periods
alternate between those with high activity and those with
relative quiescence, presumably representing breaks (with
many edges between classes, and a perceived synchrony
between the PC and PC* classes) and class periods (with few
edges between classes), respectively, although this
information is not available in the dataset.

In the above example, the best fit was obtained for a
nonoverlapping SBM, implying that the group memberships
remain stable in time. However, in some situations,
movements between groups can be inferred. As an example, we
return to the network of vote correlations of the Brazilian
national congress. Differently from before, now we inspect
a single four-year term from 2007 to 2010, and we separate
each year into one layer, yielding a network with N = 475
nodes and E = 9053 edges in total. In this case, the best
fit is obtained for an overlapping DCSBM with independent
layers and B = 12 groups, as seen in Fig. 9. The
hierarchical division clearly separates a center-left
government coalition (the largest topmost branch) from the
right-wing opposition (the smallest topmost branch). In the
government branch, we observe the existence of many
“peripheral” deputies, which are not strongly correlated
with each other, and instead are aligned with smaller
groups of more connected nodes, which are divided mostly
along party lines. This property is weakened in the later
years of the term, as more edges are observed between
peripheral deputies. The overlapping structure found is
correlated strongly with the layered divisions, such that
by observing only one layer in isolation, no overlaps are
present. Therefore, a fraction of the deputies seem to
completely change their alignment patterns in successive
years, as shown in the bottom of Fig. 9. The flow between
groups is mostly confined to either the government or
opposition groups, with the majority of the activity
occurring inside the government faction. Although some
deputies did change their party affiliation during this
period, the observed flows seem mostly uncorrelated with
this, and instead appear to show a more fine-grained
alignment between deputies that is not uniquely defined by
their party membership.
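
The flows between groups can be tallied directly from the per-year memberships. The Python sketch below is a simplified illustration: it assumes each deputy has been assigned a single group per year (in the overlapping model the memberships are attributes of half-edges, so this presupposes that only one group appears per deputy within a layer), and the function and variable names are hypothetical.

```python
from collections import Counter


def group_flows(memberships_by_year):
    """Count moves (old_group, new_group) between each pair of consecutive years."""
    flows = []
    for before, after in zip(memberships_by_year, memberships_by_year[1:]):
        moved = Counter(
            (before[node], after[node])
            for node in before
            if node in after and before[node] != after[node]
        )
        flows.append(moved)
    return flows


# Toy example: deputy "a" moves from group 0 to group 1 after the first year.
years = [{"a": 0, "b": 0, "c": 1}, {"a": 1, "b": 0, "c": 1}]
print(group_flows(years))
```

Each counter corresponds to one transition between years, and its values play the role of the edge thicknesses in a flow diagram.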


VII. CONCLUSION 

We presented a framework for the nonparametric inference of
mesoscale structures in layered, edge-valued and
time-varying networks, based on a variety of modifications
of the stochastic block model, incorporating features such
as hierarchical structure, degree-correction, and
overlapping groups. These models were formulated in a
Bayesian setting that allows the identification of the most
appropriate model variant based on statistical evidence,
corresponding to a principled balance between model
complexity and quality of fit.

We have identified an important pitfall when analyzing
network data with layered structure, where the inclusion of
many layers that are uncorrelated with the mesoscale
structure can obstruct its identification. This problem
cannot be neglected if the number of layers becomes large,
as in the case of temporal or edge-valued networks where
the layers correspond to arbitrary bins of the edge
covariates. We expect this problem to affect also
non-statistical methods based on modified modularity
maximization [?], as well as flow compression [?] and
non-negative tensor factorization [?]. In our setting, we
have shown how this can be completely avoided by comparing
the inferred model with a null model that assumes that the
layers are uncorrelated, or with a coarse-grained version
that condenses uncorrelated layers into bins.

We also showed how this framework can be extended in a
straightforward manner to networks with real-valued
attributes on the edges, and temporal networks. The
proposed methodology is capable of identifying specific
scales, both of the edge values and in time, where the
mesoscale structure does not change significantly, enabling
the identification of the most appropriate coarse-graining
of the network in discrete layers, as well as the detection
of “change points” of the network structure.

Figure 9. (Color online) Network of vote correlations among
federal deputies of the Brazilian national congress in the
four-year term from 2007 to 2010. The top panel shows the
B = 12 division obtained by fitting an overlapping DCSBM
with independent layers, with all layers collapsed into one
figure. The group labels correspond to the predominant
parties inside each group. The individual layers can be
seen in the middle panel. The bottom panel shows the flows
of deputies between each group after each year. The edge
thickness corresponds to the number of deputies, with the
largest flow corresponding to 10 deputies, and the smallest
to 1 deputy.

The unsupervised inference of the most parsimonious layered
model, as well as the appropriate granularity of the
layers, based solely on statistical evidence and requiring
no ad hoc parameters, provides a principled and robust
method to analyze multilayer, temporal and edge-valued
network data. This approach is likely to be directly useful
in a variety of tasks, such as the nonparametric modeling
of correlation networks [?], the prediction of missing
valued edges [?], the identification of relevant time
scales in temporal networks [?], and its relation to
dynamical processes taking place on them [?], among many
others.

Appendix A: Multigraphs

For multigraphs, we need to consider that parallel edges
that belong to the same layer are indistinguishable. Hence
the likelihood of Eq. (??) must be corrected to read

I I 


p({Gai w) = p(G,i w) n 

^ Tllrpa 1 


r<s 


^i>j - Yli ^u/2! 


(Al) 


The last term does not depend on the SBM parameters.
Therefore, when doing inference, the difference amounts to
a multiplicative constant which does not alter the position
of the most likely network partition, and thus could in
principle be discarded. However, this difference is
important when comparing models with a different number of
layers, as will be done below.

For the independent layers model, it suffices to use the
appropriate multigraph likelihood in each layer, as is
given in Refs. [37, 55].

Likewise, when considering the null model of Sec. IV, the
existence of parallel edges must also be accounted for.
Therefore Eq. (16) must be modified to read


P{{Gi}\{d},{Ei}) = P{G,\0) X 
In the case of binned layers, it must be analogously
modified to read

Appendix B: Directed graphs

Directed graphs represent straightforward modifications of
the models presented in the main text. For the collapsed
likelihoods and priors, we refer to Refs. [37, 55].

For the model with edge covariates and the possibility of
multiple edges, the total likelihood of Eq. (??) becomes
simply


n{a}|{9}) = -P(Gj{e»nyH ^ 

(Bl) 

And again, for the independent layers model, it suffices to
use the appropriate directed likelihood in each layer, as
is given in Refs. [37, 55].

Likewise, when considering the null model of Sec. IV for
directed graphs (with possible multiple edges), Eq. (16)
must be modified to read


P{{Gi}\{0},{Ei}) = P{G,\0) x'^ 
and in the case of binned layers,

Appendix C: Model selection for overlapping groups

In the case of the SBM with overlapping groups, we need to
specify a generative process for the overlapping partition
into B groups $\{\vec b_i\}$, and the collapsed (or
layer-specific) labeled degree sequence $\{\vec k_i\}$ (or
$\{\vec k_i^l\}$).

To generate the overlapping partition into groups, we use
the hierarchical process described in detail in Ref. [37],
already described in the main text adapted to the
generation of the layer-membership matrix $\{z_{il}\}$,
which yields $\mathcal{L}_p = -\ln P(\{\vec b_i\})$ given by

y = In ((^))((S))+ 

d 5 

(Cl) 

where $D \le B$ is the maximum mixture size d. The case
without group overlaps amounts to D = 1, reducing it to
Eq. (??).

The collapsed overlapping degree sequence can be generated
with a similar Bayesian process, described also
in Ref. [37], that yields a description length C,^ = where InS^ 2^({2)e^. For the case without overlaps

- In P{{ki}) given by this reduces to Eq. i The edge-specific overlapping de- 

.... . . gree sequence is obtained according to Eq. pTl 

^ b 

with 

r^b k 
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